{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cFFW8D-weE2S"
      },
      "source": [
        "# Tutorial: Serializing LLM Pipelines\n",
        "\n",
        "- **Level**: Beginner\n",
        "- **Time to complete**: 10 minutes\n",
        "- **Components Used**: [`HuggingFaceLocalGenerator`](https://docs.haystack.deepset.ai/v2.0/docs/huggingfacelocalgenerator), [`PromptBuilder`](https://docs.haystack.deepset.ai/v2.0/docs/promptbuilder)\n",
        "- **Prerequisites**: None\n",
        "- **Goal**: After completing this tutorial, you'll understand how to serialize and deserialize between YAML and Python code."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DxhqjpHfenQl"
      },
      "source": [
        "## Overview\n",
        "\n",
        "**📚 Useful Documentation:** [Serialization](https://docs.haystack.deepset.ai/v2.0/docs/serialization)\n",
        "\n",
        "Serialization means converting a pipeline to a format that you can save on your disk and load later. It's especially useful because a serialized pipeline can be saved on disk or a database, get sent over a network and more. \n",
        "\n",
        "Although it's possible to serialize into other formats too, Haystack supports YAML our of the box to make it easy for humans to make changes without the need to go back and forth with Python code. In this tutorial, we will create a very simple pipeline in Python code, serialize it into YAML, make changes to it, and deserialize it back into a Haystack `Pipeline`."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9smrsiIqfS7J"
      },
      "source": [
        "## Preparing the Colab Environment\n",
        "\n",
        "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab)\n",
        "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/log-level)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TLaHxdJcfWtI"
      },
      "source": [
        "## Installing Haystack\n",
        "\n",
        "Install Haystack 2.0 Beta with `pip`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "CagzMFdkeBBp",
        "outputId": "e304450a-24e3-4ef8-e642-1fbb573e7d55"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Requirement already satisfied: haystack-ai in /usr/local/lib/python3.10/dist-packages (2.0.0b5)\n",
            "Requirement already satisfied: boilerpy3 in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (1.0.7)\n",
            "Requirement already satisfied: haystack-bm25 in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (1.0.2)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (3.1.3)\n",
            "Requirement already satisfied: lazy-imports in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (0.3.1)\n",
            "Requirement already satisfied: more-itertools in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (10.1.0)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (3.2.1)\n",
            "Requirement already satisfied: openai>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (1.10.0)\n",
            "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (1.5.3)\n",
            "Requirement already satisfied: posthog in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (3.3.3)\n",
            "Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (6.0.1)\n",
            "Requirement already satisfied: tenacity in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (8.2.3)\n",
            "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (4.66.1)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from haystack-ai) (4.9.0)\n",
            "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai>=1.1.0->haystack-ai) (3.7.1)\n",
            "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai>=1.1.0->haystack-ai) (1.7.0)\n",
            "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from openai>=1.1.0->haystack-ai) (0.26.0)\n",
            "Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from openai>=1.1.0->haystack-ai) (1.10.14)\n",
            "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai>=1.1.0->haystack-ai) (1.3.0)\n",
            "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from haystack-bm25->haystack-ai) (1.23.5)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->haystack-ai) (2.1.4)\n",
            "Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas->haystack-ai) (2.8.2)\n",
            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->haystack-ai) (2023.3.post1)\n",
            "Requirement already satisfied: requests<3.0,>=2.7 in /usr/local/lib/python3.10/dist-packages (from posthog->haystack-ai) (2.31.0)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from posthog->haystack-ai) (1.16.0)\n",
            "Requirement already satisfied: monotonic>=1.5 in /usr/local/lib/python3.10/dist-packages (from posthog->haystack-ai) (1.6)\n",
            "Requirement already satisfied: backoff>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from posthog->haystack-ai) (2.2.1)\n",
            "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai>=1.1.0->haystack-ai) (3.6)\n",
            "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai>=1.1.0->haystack-ai) (1.2.0)\n",
            "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai>=1.1.0->haystack-ai) (2023.11.17)\n",
            "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai>=1.1.0->haystack-ai) (1.0.2)\n",
            "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai>=1.1.0->haystack-ai) (0.14.0)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0,>=2.7->posthog->haystack-ai) (3.3.2)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0,>=2.7->posthog->haystack-ai) (2.0.7)\n"
          ]
        }
      ],
      "source": [
        "%%bash\n",
        "\n",
        "pip install haystack-ai"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MhnSGxXWHNsD"
      },
      "source": [
        "### Enabling Telemetry\n",
        "\n",
        "Knowing you're using this tutorial helps us decide where to invest our efforts to build a better product but you can always opt out by commenting the following line. See [Telemetry](https://docs.haystack.deepset.ai/v2.0/docs/enabling-telemetry) for more details."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ikIM1o9cHNcS"
      },
      "outputs": [],
      "source": [
        "from haystack.telemetry import tutorial_running\n",
        "\n",
        "tutorial_running(29)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kS8rz9gGgMBb"
      },
      "source": [
        "## Creating a Simple Pipeline\n",
        "\n",
        "First, let's create a very simple pipeline that expects a `topic` from the user, and generates a summary about the topic with `google/flan-t5-large`. Feel free to modify the pipeline as you wish. Note that in this pipeline we are using a local model that we're getting from Hugging Face. We're using a relatively small, open-source LLM."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "odZJjD7KgO1g"
      },
      "outputs": [],
      "source": [
        "from haystack import Pipeline\n",
        "from haystack.components.builders import PromptBuilder\n",
        "from haystack.components.generators import HuggingFaceLocalGenerator\n",
        "\n",
        "template = \"\"\"\n",
        "Please create a summary about the following topic:\n",
        "{{ topic }}\n",
        "\"\"\"\n",
        "builder = PromptBuilder(template=template)\n",
        "llm = HuggingFaceLocalGenerator(model=\"google/flan-t5-large\",\n",
        "                                      task=\"text2text-generation\",\n",
        "                                      generation_kwargs={\n",
        "                                        \"max_new_tokens\": 150,\n",
        "                                        })\n",
        "\n",
        "pipeline = Pipeline()\n",
        "pipeline.add_component(name=\"builder\", instance=builder)\n",
        "pipeline.add_component(name=\"llm\", instance=llm)\n",
        "\n",
        "pipeline.connect(\"builder\", \"llm\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "W-onTCXfqFjG",
        "outputId": "e81cd5ea-db66-4f0e-f787-5aed7a7b4692"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Climate change is a major threat to the planet.\n"
          ]
        }
      ],
      "source": [
        "topic = \"Climate change\"\n",
        "result = pipeline.run(data={\"builder\": {\"topic\": topic}})\n",
        "print(result['llm']['replies'][0])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "61r7hc1vuUMH"
      },
      "source": [
        "## Serialize the Pipeline to YAML\n",
        "\n",
        "Out of the box, Haystack supports YAML. Use `dumps()` to convert the pipeline to YAML:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "vYOEAesbrn4w",
        "outputId": "ef037904-79f4-46a4-c8e7-d03ea8dcb6c2"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "components:\n",
            "  builder:\n",
            "    init_parameters:\n",
            "      template: \"\\nPlease create a summary about the following topic: \\n{{ topic }}\\n\"\n",
            "    type: haystack.components.builders.prompt_builder.PromptBuilder\n",
            "  llm:\n",
            "    init_parameters:\n",
            "      generation_kwargs:\n",
            "        max_new_tokens: 150\n",
            "      huggingface_pipeline_kwargs:\n",
            "        device: cpu\n",
            "        model: google/flan-t5-large\n",
            "        task: text2text-generation\n",
            "        token: null\n",
            "      stop_words: null\n",
            "    type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator\n",
            "connections:\n",
            "- receiver: llm.prompt\n",
            "  sender: builder.prompt\n",
            "max_loops_allowed: 100\n",
            "metadata: {}\n",
            "\n"
          ]
        }
      ],
      "source": [
        "yaml_pipeline = pipeline.dumps()\n",
        "\n",
        "print(yaml_pipeline)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0C7zGsUCGszq"
      },
      "source": [
        "You should get a pipeline YAML that looks like the following:\n",
        "\n",
        "```yaml\n",
        "components:\n",
        "  builder:\n",
        "    init_parameters:\n",
        "      template: \"\\nPlease create a summary about the following topic: \\n{{ topic }}\\n\"\n",
        "    type: haystack.components.builders.prompt_builder.PromptBuilder\n",
        "  llm:\n",
        "    init_parameters:\n",
        "      generation_kwargs:\n",
        "        max_new_tokens: 150\n",
        "      huggingface_pipeline_kwargs:\n",
        "        device: cpu\n",
        "        model: google/flan-t5-large\n",
        "        task: text2text-generation\n",
        "        token: null\n",
        "      stop_words: null\n",
        "    type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator\n",
        "connections:\n",
        "- receiver: llm.prompt\n",
        "  sender: builder.prompt\n",
        "max_loops_allowed: 100\n",
        "metadata: {}\n",
        "\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f9MknQ-1vQ8r"
      },
      "source": [
        "## Editing a Pipeline in YAML\n",
        "\n",
        "Let's see how we can make changes to serialized pipelines. For example, below, let's modify the promptbuilder's template to translate provided `sentence` to French:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "U332-VjovFfn"
      },
      "outputs": [],
      "source": [
        "yaml_pipeline = \"\"\"\n",
        "components:\n",
        "  builder:\n",
        "    init_parameters:\n",
        "      template: \"\\nPlease translate the following to French: \\n{{ sentence }}\\n\"\n",
        "    type: haystack.components.builders.prompt_builder.PromptBuilder\n",
        "  llm:\n",
        "    init_parameters:\n",
        "      generation_kwargs:\n",
        "        max_new_tokens: 150\n",
        "      huggingface_pipeline_kwargs:\n",
        "        device: cpu\n",
        "        model: google/flan-t5-large\n",
        "        task: text2text-generation\n",
        "        token: null\n",
        "      stop_words: null\n",
        "    type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator\n",
        "connections:\n",
        "- receiver: llm.prompt\n",
        "  sender: builder.prompt\n",
        "max_loops_allowed: 100\n",
        "metadata: {}\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xLBtgY0Ov8nX"
      },
      "source": [
        "## Deseriazling a YAML Pipeline back to Python\n",
        "\n",
        "You can deserialize a pipeline by calling `loads()`. Below, we're deserializing our edited `yaml_pipeline`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "OdlLnw-9wVN-"
      },
      "outputs": [],
      "source": [
        "from haystack import Pipeline\n",
        "from haystack.components.builders import PromptBuilder\n",
        "from haystack.components.generators import HuggingFaceLocalGenerator\n",
        "\n",
        "new_pipeline = Pipeline.loads(yaml_pipeline)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eVPh2cV6wcu9"
      },
      "source": [
        "Now we can run the new pipeline we defined in YAML. We had changed it so that the `PromptBuilder` expects a `sentence` and translates the sentence to French:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "oGLi3EB_wbu6",
        "outputId": "ec6eae9f-a7ea-401d-c0ab-792748f6db6f"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'llm': {'replies': ['Je me félicite des capybaras !']}}"
            ]
          },
          "execution_count": 15,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "new_pipeline.run(data={\"builder\": {\"sentence\": \"I love capybaras\"}})"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## What's next\n",
        "\n",
        "🎉 Congratulations! You've serialzed a pipeline into YAML, edited it and ran it again!\n",
        "\n",
        "If you liked this tutorial, you may also enjoy:\n",
        "-  [Creating Your First QA Pipeline with Retrieval-Augmentation](https://haystack.deepset.ai/tutorials/27_first_rag_pipeline)\n",
        "\n",
        "To stay up to date on the latest Haystack developments, you can [sign up for our newsletter](https://landing.deepset.ai/haystack-community-updates?utm_campaign=developer-relations&utm_source=tutorial&utm_medium=serialization). Thanks for reading!"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
